A kTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE

M. Anthony Wong and Tom Lane

Sloan   School   of   Management 

Massachusetts    Institute    of   Technology 

Cambridge,    MA      02139 

SUMMARY 
Due    to    the    lack   of   development    in    the    probabilistic    and    statistical 
aspects   of   clustering    research,   clustering    procedures   are   often 
regarded    as   heuristics   generating   artificial   clusters    from   a  given 
set    of    sample   data.      In   this   paper,    a  clustering   procedure   that    is 
useful    for   drawing   statistical    inference    about    the   underlying 
population    from   a  random   sample   is   developed.      It    is   based   on   the 
uniformly   consistent   kth  nearest   neighbour   density   estimate,    and    is 
applicable    to   both   case-by-variable   data  matrices    and    case-by-case 
dissimilarity  matrices.      The    proposed   clustering   procedure    is    shown 
to  be   asymptotically  consistent    for   high-density  clusters    in   several 
dimensions,    and    its    small-sample    behavior    is    illustrated   by 
empirical   examples.      A  real    application   is    also    included    to 
demonstrate   the   practical    utility  of   this   clustering  method. 
Keywords: CLUSTERING PROCEDURE; HIGH-DENSITY CLUSTERS; kTH NEAREST NEIGHBOUR
DENSITY ESTIMATION; SET-CONSISTENCY.
1. INTRODUCTION

1.1 Shortcomings of Clustering Procedures
A  recent  study  by  Blashfield  and  Aldenderfer  (1978)  shows  that  numerous 
clustering  methods  have  been  developed  in  the  past  two  decades.   A  review  of 
many  of  these  techniques  can  be  found  in  Cormack  (1971),  Anderberg  (1973), 
Sneath and Sokal (1973), Everitt (1974), Hartigan (1975), and Späth (1980).
However,  hardly  any  of  the  originators  of  these  methods  have  approached  the 
clustering  problem  from  within  a  theoretical  framework.   More  often  than  not, 
the  concept  of  a  real  population  cluster  is  vague  and  is  left  undefined. 
Since no statistical evaluation of the sample clusters can be performed under
these circumstances, the validity of the clusters obtained by these methods is
always questionable. Consequently, the existing clustering procedures are
often regarded as heuristics generating artificial clusters from a given set
of sample data, and there is a need for clustering procedures that are useful
for drawing statistical inference about the underlying population from a
sample. In this paper, a clustering procedure based on the kth nearest
neighbour density estimate is proposed, and it is shown to be set-consistent
for high-density clusters in several dimensions. The set-consistency property
of a hierarchical clustering procedure will be defined next.

1.2 A Theoretical Approach to Evaluating Hierarchical Clustering Methods
In  order  to  evaluate  the  sampling  property  of  a  clustering  method,  it  is 
necessary  to  have  population  clusters  defined  on  population  probability 
density  functions  from  which  the  observations  are  obtained,  and  to  have  some 
ways  of  judging  how  the  sample  clusters  deviate  from  the  population  clusters. 
Let the observations $x_1, x_2, \ldots, x_N$ in p-dimensional space be sampled
from a population with density f, taken with respect to Lebesgue measure. Using
the high-density clustering model given in Hartigan (1975, p. 205), the true
population clusters can be defined on f as follows: a high-density cluster at
level $f_0$ in the population is defined as a maximal connected set of the form
$\{x \mid f(x) \ge f_0\}$. The family T of such clusters forms a tree, in that
$A \in T$, $B \in T$ implies either $A \supset B$, $B \supset A$, or
$A \cap B = \emptyset$. A hierarchical clustering procedure, which produces a
sample clustering tree $T_N$ on the observations $x_1, \ldots, x_N$, may then be
evaluated by examining whether $T_N$ converges to T with probability one when N
approaches infinity. A clustering method (or equivalently, $T_N$) is said to be
strongly set-consistent for high-density clusters (or T) if for any
$A, B \in T$ with $A \cap B = \emptyset$,

$$P\{A_N \cap B_N = \emptyset \text{ as } N \to \infty\} = 1,$$

where $A_N$ and $B_N$ are respectively the smallest clusters in the sample tree
$T_N$ containing all the sample points in A and B. Since $A \subset B$ implies
$A_N \subset B_N$, this limit result means that the tree relationship in $T_N$
converges strongly to the tree relationship in T.
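The definition of a high-density cluster can be illustrated on a one-dimensional grid, where a "maximal connected set" is simply a maximal run of consecutive grid points whose density is at least the chosen level. The following is a minimal sketch only: the function name `level_set_clusters`, the density values and the two levels are hypothetical choices made for this illustration and do not come from the paper.

```python
def level_set_clusters(values, level):
    """Return maximal runs (start, end) of grid indices with values[i] >= level."""
    clusters, start = [], None
    for i, v in enumerate(values):
        if v >= level and start is None:
            start = i                       # a new run begins here
        elif v < level and start is not None:
            clusters.append((start, i - 1))  # the run ended at the previous index
            start = None
    if start is not None:                   # close a run that reaches the last index
        clusters.append((start, len(values) - 1))
    return clusters


# Hypothetical bimodal density evaluated on an evenly spaced grid.
density = [0.1, 0.4, 0.8, 0.4, 0.2, 0.5, 0.9, 0.5, 0.1]

low = level_set_clusters(density, 0.2)    # one cluster spanning both modes
high = level_set_clusters(density, 0.45)  # two clusters, one around each mode
print(low, high)
```

Every cluster found at the higher level lies inside a cluster found at the lower level, which is the tree property of the family of high-density clusters.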

Using  this  definition  of  consistency,  hierarchical  clustering  methods  can 
be  evaluated  by  examining  whether  they  are  strongly  set-consistent  for 
high-density  clusters.   If  a  clustering  procedure  is  set-consistent,  the 
sequence  of  enlarging  hierarchical  clusters  that  it  produces  in  the  sample  are 
groups  of  points  lying  within  successively  lower  density  contours  in  the 
underlying  distribution.   Hence,  these  sample  high-density  clusters  are  useful 
in  indicating  the  number  of  modal  regions  in  the  population  as  well  as 
identifying  their  locations  in  the  underlying  space.   And  since  it  is  the 
geometrical  shape  of  the  population  density  contours  that  determines  the 
configuration  of  the  sample  high-density  clusters,  a  set-consistent  clustering 
method  does  not  impose  structure  on  the  clusters  it  produces.   (See  Everitt 
(1974,  Chapter  4)  for  some  well-known  clustering  methods  that  impose  a 


spherical    structure   on   the   clusters    they   produce.)      On   the   other  hand,    a 
clustering    procedure    that    is    not    set-consistent    is    not    adaptive    to    the 
underlying  distribution,    and    is   hence   not    suitable    for   detecting   high-density 
or "natural" clusters (see Carmichael et al., 1968).

Hartigan  (1977a,    1977b,    1979)    has   examined    the    set-consistency  of  many  of 
the   best    known   hierarchical    clustering   methods    for   high-density   clusters.       It 
was    shown    that    the    complete    linkage    (Sorenson    1948)    and    average    linkage 
(Sneath   and   Sokal    1973)   methods   are   not    set-consistent,    while   single    linkage 
(Sneath    1957)    is   weakly    set-consistent    in   one   dimension  but    not    in   higher 
dimensions. Thus most of the relevant evaluative work under the high-density
clustering model has been carried out. However, the important problem of
developing clustering procedures that are set-consistent for high-density
clusters has not received much attention. In Hartigan and Wong (1979), and
Wong (1980), a hybrid clustering method is developed which is weakly
set-consistent for high-density clusters in one dimension; and there is
empirical evidence that similar consistency results hold in several
dimensions. However, although the hybrid method has the advantage of being
practicable for very large data sets, it is not well-suited for small samples
(N < 100) and it is only applicable to case-by-variable data matrices. In
this paper, a strongly set-consistent clustering procedure is developed which
is applicable to both case-by-variable data matrices and case-by-case distance
matrices, and its development is outlined next.

1.3     Development   of   the   kth  Nearest  Neighbour  Clustering  Procedure 

Under    the   high-density   clustering   model,    density   estimates   can   be    used    to 
generate   sample   clusters,    namely   the  high-density  clusters   defined   on   the 
estimates.      And   a  clustering   procedure   is   expected    to   be   set-consistent    for 
high-density  clusters    if   it    is   based   on  a   uniformly   consistent   density 


estimate. Single linkage corresponds to nearest neighbour density estimation
(Hartigan 1977b), in which the density estimate $f_N(x)$ at a point x is
inversely proportional to the volume of the smallest closed sphere including
one sample point. This density estimate is not consistent in the sense that
$f_N(x)$ does not approach f(x) in probability. An improved density estimate,
and perhaps improved clustering, can be obtained by the kth nearest neighbour
density estimate: the estimated density at point x is $f_N(x) = k/(N V_k(x))$,
where $V_k(x)$ is the volume of the closed sphere centered at x containing k
sample points. Such a density estimate is uniformly consistent with
probability 1 if f is uniformly continuous and if k = k(N) satisfies
$k(N)/N \to 0$ and $k(N)/\log N \to \infty$. (See, for example, Devroye and
Wagner 1977, and Moore and Yackel 1977.)
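The kth nearest neighbour density estimate described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `ball_volume` and `knn_density` and the toy sample are hypothetical, distances are Euclidean, and the k sample points are found by a brute-force sort.

```python
import math


def ball_volume(radius, p):
    """Volume of a p-dimensional Euclidean ball of the given radius."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1) * radius ** p


def knn_density(x, sample, k):
    """k / (N * V_k(x)), with V_k(x) the volume of the smallest ball
    centered at x that contains at least k of the sample points."""
    dists = sorted(math.dist(x, s) for s in sample)
    r_k = dists[k - 1]                      # radius reaching the kth closest sample point
    return k / (len(sample) * ball_volume(r_k, len(x)))


# Hypothetical one-dimensional sample; points are written as 1-tuples.
sample = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
estimate = knn_density((2.0,), sample, k=3)
print(estimate)   # radius 1, interval length 2, so 3 / (5 * 2)
```

In this toy case the ball of radius 1 around x = 2 contains three sample points and has length 2, so the estimate is 3/(5 × 2) = 0.3.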

Wishart (1969), in an attempt to improve on the single linkage clustering
technique, developed a procedure entitled Mode Analysis which is related to
the kth nearest neighbour density estimate. However, Wishart's procedure was
not designed to obtain the high-density clusters defined on the density
estimate, and hence its set-consistency for high-density clusters was never
established. Moreover, since its computational algorithm is quite
complicated, Mode Analysis did not receive much attention in the
clustering literature. In this paper, a clustering algorithm for deriving the
tree  of  sample  high-density  clusters  from  the  kth  nearest  neighbour  density 
estimate  is  developed.   A  detailed  description  of  this  clustering  procedure  is 
given  in  Section  2.   In  Section  3,  it  is  established  that  the  proposed  method 
is  strongly  set-consistent  for  high-density  clusters.   Empirical  examples  are 
given  in  Section  4  to  illustrate  the  small-sample  behavior  of  kth  nearest 
neighbour  clustering.   A  real  example  is  presented  in  Section  5  to  demonstrate 
the  practical  utility  of  the  proposed  clustering  method. 


2. A kTH NEAREST NEIGHBOUR CLUSTERING PROCEDURE

The  proposed  nearest  neighbour  clustering  algorithm  consists  of  two 
stages.   At  the  first  stage,  the  kth  nearest  neighbour  density  estimation 
procedure  is  used  to  obtain  a  uniformly  consistent  estimate  of  the  underlying 
density.   The  tree  of  sample  high-density  clusters  defined  on  the  estimated 
density  is  computed  at  the  second  stage  of  the  algorithm.   At  this  latter 
stage, a distance matrix is first computed in which the distance between two
"neighbouring" points (i.e. points such that at least one of them is among the
k nearest neighbours of the other) is defined to be inversely proportional to
a pooled density estimate at the point halfway between them, and the single
linkage clustering algorithm (Sneath, 1957) is then applied to this distance
matrix to obtain the tree of sample clusters.

2.1   The  Density  Estimation  Stage 

The kth nearest neighbour density estimation procedure is used in this
stage of the clustering procedure because it provides a strongly uniformly
consistent estimate of the underlying density. Let $x_1, \ldots, x_N$ be
independent, identically distributed random vectors with values in $R^p$,
$p \ge 1$, and with a common probability density f. If $V_k(x)$ is the volume
of the smallest sphere centered at x and containing at least k of the random
vectors $x_1, \ldots, x_N$, then the kth nearest neighbour density estimate of
f at x is

$$f_N(x) = \frac{k}{N V_k(x)}.$$

In Devroye and Wagner (1977), the following strong uniform consistency
result for this estimate is shown:

Lemma (Devroye and Wagner, 1977):

If f is uniformly continuous on $R^p$ and if k = k(N) is a sequence of
positive integers satisfying:

(a) $k(N)/N \to 0$, and

(b) $k(N)/\log N \to \infty$, as $N \to \infty$,

then

$$\sup_x |f_N(x) - f(x)| \to 0 \quad \text{with probability 1.}$$

One  purpose  of  the  kth  nearest  neighbour  clustering  method  is  to  discover  the 
population  high-density  clusters  given  a  random  sample  from  some  underlying 
distribution  F  with  density  f.   In  this  first  step  of  the  proposed  procedure, 
a  uniformly  consistent  estimate  of  f  is  obtained.   The  high-density  clusters 
defined on the estimated density $f_N$ can then be used as sample estimates of
the  population  high-density  clusters  defined  on  f.   These  hierarchical  sample 
high-density  clusters  are  constructed  in  the  second  stage  of  the  proposed 
clustering  algorithm. 
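As a worked illustration of the two growth conditions on k(N) in the Lemma, consider the choice k(N) = ⌈√N⌉. This choice is an example introduced here for illustration; the paper does not prescribe a particular sequence.

```latex
k(N) = \lceil \sqrt{N}\, \rceil : \qquad
\frac{k(N)}{N} \le \frac{\sqrt{N} + 1}{N} \to 0,
\qquad
\frac{k(N)}{\log N} \ge \frac{\sqrt{N}}{\log N} \to \infty
\qquad (N \to \infty).
```

By contrast, a fixed k violates condition (b), since k/log N tends to 0, and k(N) = ⌈N/2⌉ violates condition (a), since k(N)/N tends to 1/2.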

2.2 The Hierarchical Clustering Stage

In this stage, a distance matrix $D(x_i, x_j)$, $1 \le i, j \le N$, for the N
observations is first computed using the following definitions:

Definition 1: Two observations $x_i$ and $x_j$ are said to be neighbours if
$d^*(x_i, x_j) \le d_k(x_i)$ or $d_k(x_j)$, where $d^*$ is the Euclidean metric
and $d_k(x_i)$ is the kth nearest neighbour distance to point $x_i$.

Definition 2: The distance $D(\cdot,\cdot)$ between the observations $x_i$ and
$x_j$ is

$$D(x_i, x_j) = \begin{cases}
\dfrac{1}{2}\left[\dfrac{1}{f_N(x_i)} + \dfrac{1}{f_N(x_j)}\right]
 = \dfrac{N}{2k}\left[V_k(x_i) + V_k(x_j)\right], & \text{if } x_i \text{ and } x_j \text{ are neighbours;} \\
\infty, & \text{otherwise.}
\end{cases}$$


Hence, finite distances are defined only for pairs of observations which
are in the same neighbourhood in $R^p$, and the defined distance between a pair
of neighbouring observations is inversely proportional to a pooled density
estimate at the point halfway between them. The following single linkage
clustering technique is then applied to this distance matrix D to obtain the
tree of sample high-density clusters.
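The two definitions can be sketched for a single pair of observations as follows. This is a minimal sketch under stated assumptions: the function name `pooled_distance` and the numerical inputs are hypothetical, the neighbour test is read as "less than or equal to", and the kth nearest neighbour distances are taken as already computed.

```python
import math


def pooled_distance(d_ij, dk_i, dk_j, k, n, p):
    """Distance D of Definition 2 for one pair, or math.inf for non-neighbours.

    d_ij       : Euclidean distance between the two observations
    dk_i, dk_j : their kth nearest neighbour distances
    k, n, p    : neighbour count, sample size, dimension
    """
    if not (d_ij <= dk_i or d_ij <= dk_j):          # Definition 1 fails
        return math.inf
    unit = math.pi ** (p / 2) / math.gamma(p / 2 + 1)  # volume of the unit p-ball
    v_i, v_j = unit * dk_i ** p, unit * dk_j ** p      # V_k at each observation
    f_i, f_j = k / (n * v_i), k / (n * v_j)            # kth NN density estimates
    return 0.5 * (1 / f_i + 1 / f_j)


# Hypothetical one-dimensional pair: N = 10, k = 2.
near = pooled_distance(d_ij=1.5, dk_i=1.0, dk_j=2.0, k=2, n=10, p=1)
far = pooled_distance(d_ij=3.0, dk_i=1.0, dk_j=2.0, k=2, n=10, p=1)
print(near, far)
```

For the first pair the volumes are 2 and 4, so both forms of Definition 2 give the same value: (1/2)(10 + 20) = (10/4)(2 + 4) = 15. The second pair is not neighbouring and receives an infinite distance.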

Given a set of objects $x_1, \ldots, x_N$ with distances $D(x_i, x_j)$,
$1 \le i < j \le N$, single linkage clusters are defined as follows: let
$x_i$ and $x_j$ be the closest pair of objects; amalgamate them to form a
cluster c and define the distance between that cluster and any object $x_m$ to
be $D(c, x_m) = \min[D(x_i, x_m), D(x_j, x_m)]$; repeat the process treating c
as an object and ignoring $x_i$ and $x_j$. The amalgamation continues until all
objects are grouped in one large cluster. All clusters obtained in the course
of this hierarchical algorithm are single linkage clusters. (See Gower and
Ross (1969), and Hartigan (1975) for computational single linkage algorithms.)
Single linkage clustering is used in this step of the proposed procedure
because it has the following property: at every stage of the clustering, the
single linkage clusters are the maximal linked sets if objects $x_i$ and $x_j$
are said to be linked whenever $D(x_i, x_j)$ is no greater than a given
distance $D_0$. Now, since the distance D between two "neighbouring"
observations is reciprocal to the density estimate $f_N$ at the midpoint
between them, every cluster obtained by applying single linkage to D has the
property that the density estimates over the objects in this cluster are
greater than a certain density level $f_0$. Moreover, as the distance measure
D is defined only for pairs of "neighbouring" observations, the resultant
single linkage clusters correspond to maximal connected sets of the form
$\{x \mid f_N(x) \ge f_0\}$, which are the high-density clusters defined on
$f_N$.
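The "maximal linked sets" property of single linkage can be illustrated with a small union-find sketch: at a given threshold, two objects are joined whenever their distance does not exceed the threshold, and the resulting groups are the single linkage clusters at that level. The function name `linked_sets` and the 4 × 4 distance matrix below are hypothetical and chosen only for illustration.

```python
import math


def linked_sets(dist, d0):
    """Maximal linked sets: i and j are linked when dist[i][j] <= d0."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= d0:
                parent[find(i)] = find(j)   # join the two linked groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


INF = math.inf   # non-neighbouring pairs have infinite distance
D = [[0, 1, INF, INF],
     [1, 0, 3, INF],
     [INF, 3, 0, 2],
     [INF, INF, 2, 0]]

for level in (1, 2, 3):
    print(level, linked_sets(D, level))
```

Raising the threshold from 1 to 3 merges the groups step by step, reproducing the nested sequence of clusters that single linkage amalgamation yields on this matrix.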

2.3 The Computational Algorithm
Since high-density clusters are invariant to monotone transformations of
the density function, the kth nearest neighbour distances $d_k(x_i)$,
$i = 1, \ldots, N$, are used instead of the $V_k(x_i)$'s in the following
computational algorithm of the kth nearest neighbour clustering procedure:

STEP 1: For $i = 1, 2, \ldots, N$, compute $d_k(x_i)$, the kth nearest
neighbour distance of $x_i$. (For a computationally efficient algorithm to
find the kth nearest neighbour distances, see Friedman et al., 1975.)

STEP 2: Compute the distance matrix D as follows:

$$D(x_i, x_j) = \begin{cases}
\dfrac{1}{2}\left[d_k(x_i) + d_k(x_j)\right], & \text{if } d^*(x_i, x_j) \le d_k(x_i) \text{ or } d_k(x_j), \\
\infty, & \text{otherwise,}
\end{cases}$$

where $d^*$ is the Euclidean metric.

STEP 3: Apply the single linkage clustering algorithm to the computed
distance matrix D to obtain the sample tree of high-density clusters.

The computational requirements for STEP 1 and STEP 3 are $O(pN \log N)$ and
$O(Nk)$ respectively. Hence, unlike the hybrid clustering method (Hartigan and
Wong, 1979 and Wong, 1980), this procedure is not practicable for large data
sets; but, it is better suited for small samples, and is applicable to both
case-by-variable data matrices and case-by-case dissimilarity matrices.
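The three steps can be sketched end to end in plain Python. This is a simplified illustration, not the authors' program: the function name `knn_cluster_tree` and the toy data are hypothetical, the kth nearest neighbour distance is assumed to be measured to other sample points only, the neighbour test is read as "less than or equal to", STEP 1 uses a brute-force search rather than the algorithm of Friedman et al., and STEP 3 is carried out by processing the finite distances in increasing order with a union-find structure.

```python
import math


def knn_cluster_tree(points, k):
    """Return the finite-height merges (height, members) of the cluster tree."""
    n = len(points)
    d = [[math.dist(a, b) for b in points] for a in points]

    # STEP 1: kth nearest neighbour distance, excluding the point itself.
    dk = [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(d)]

    # STEP 2: finite distances only for neighbouring pairs.
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] <= dk[i] or d[i][j] <= dk[j]:
                edges.append((0.5 * (dk[i] + dk[j]), i, j))

    # STEP 3: single linkage, merging along edges in increasing order.
    parent = list(range(n))
    members = {i: [i] for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    for height, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            members[rj] = sorted(members[rj] + members.pop(ri))
            merges.append((height, members[rj]))
    return merges


# Hypothetical sample: two well-separated groups on a line.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
tree = knn_cluster_tree(points, k=2)
for height, cluster in tree:
    print(height, cluster)
```

In this toy example no pair drawn from different groups is neighbouring, so their distance is infinite and the two groups never merge at a finite height; they appear as two separate modal regions of the tree.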

3. STRONG SET-CONSISTENCY OF kTH NEAREST NEIGHBOUR CLUSTERING
The asymptotic consistency of the kth nearest neighbour clustering method
for high-density clusters in $R^p$, $p \ge 1$, is given in the following
theorem:

Theorem: Let f denote a positive, uniformly continuous function on $R^p$ such
that $\{x \mid f(x) \ge f_0\}$ is the union of a finite number of compact
subsets of $R^p$ for every $f_0 > 0$. Let T be the tree of population
high-density clusters defined on f. Suppose that A and B are any two disjoint
high-density clusters in T with connected interiors. Let $x_1, \ldots, x_N$ be
a random sample from f and let $T_N$ be the hierarchical clustering specified
by the kth nearest neighbour clustering algorithm. Then, provided that
k = k(N) satisfies

(a) $k(N)/N \to 0$, and

(b) $k(N)/\log N \to \infty$

as $N \to \infty$, there exist $A_N, B_N \in T_N$ with
$A_N \supset A \cap \{x_1, \ldots, x_N\}$,
$B_N \supset B \cap \{x_1, \ldots, x_N\}$ and $A_N \cap B_N = \emptyset$ with
probability 1.

Proof: Since $T_N$ is the tree of high-density clusters for $f_N$, this theorem
is a direct consequence of the Lemma, which states that

$$\sup_x |f_N(x) - f(x)| \to 0, \quad \text{with probability 1.} \qquad (3.1)$$

By definition, for any two disjoint high-density clusters A and B in T, there
exist $\delta > 0$, $\varepsilon > 0$ and $\lambda > 0$, such that

(i) $f(x) \ge \lambda$ for all $x \in A \cup B$, and (3.2)

(ii) each rectilinear path between A and B contains a segment, with
length greater than $\delta$, along which the density
$f(x) < \lambda - 3\varepsilon$. (3.3)

From (3.1), we have for N large,

$$\sup_x |f_N(x) - f(x)| < \varepsilon \quad \text{w.p. 1.}$$

Thus, it follows from (3.2) and (3.3) that for N large, with probability 1,

(iii) $f_N(x) > \lambda - \varepsilon$ for all $x \in A \cup B$, and (3.4)

(iv) each rectilinear path between A and B contains a segment, with
length greater than $\delta$, along which the density estimate
$f_N(x) \le \lambda - 2\varepsilon$. (3.5)

Since A and B are disjoint, it follows from (3.4) and (3.5) that
high-density clusters of the form $\{x \mid f_N(x) > \lambda - \varepsilon\}$
separate the observations in A and B. The theorem follows.
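The step from the uniform bound on the estimation error to the two inequalities for the density estimate can be spelled out as follows. Here λ and ε denote the density level and the margin chosen in the proof, and the only ingredient is the triangle inequality applied pointwise. This expansion is added for the reader's convenience and is not part of the original argument.

```latex
\begin{align*}
x \in A \cup B: \quad
  & f_N(x) \;\ge\; f(x) - |f_N(x) - f(x)| \;>\; \lambda - \varepsilon, \\
x \text{ on the low-density segment}: \quad
  & f_N(x) \;\le\; f(x) + |f_N(x) - f(x)|
    \;<\; (\lambda - 3\varepsilon) + \varepsilon \;=\; \lambda - 2\varepsilon .
\end{align*}
```

The gap of width ε between the two bounds is what keeps the sample points of A and of B in different high-density clusters of the estimate.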




4.   EMPIRICAL  STUDY  OF  THE  SMALL-SAMPLE  BEHAVIOR  OF  THE  kTH  NEAREST  NEIGHBOUR 
CLUSTERING  PROCEDURE 

To illustrate the small-sample behavior of the kth nearest neighbour
clustering procedure, an empirical study was performed in which the procedure
was applied to various generated data sets. Results of three experiments, in
which bivariate data were used, are reported here.

1. Experiment One: 30 observations were generated so that two spherical
clusters of observations are present in this data set. The scatter-plot of
this sample set is shown in Figure 1a, in which the observation numbers are
plotted next to the observations. This data set is useful for illustrating
the effectiveness of the proposed procedure in identifying spherical clusters.
The dendrogram giving the hierarchical clustering obtained by the kth nearest
neighbour method (using k = 4) is shown in Figure 1b. It is clear that, in
this experiment, the kth nearest neighbour clustering indicates the presence
of two modal regions or clusters.

However, for a choice of k which is much too small, the dendrogram
produced by the proposed method would tend to suggest the presence of a few
extra modal regions. The reason for this moderate sensitivity of the proposed
method to the choice of k is that, if k is too small, extra modes tend to
appear in the kth nearest neighbour density estimate, and these bumps in the
estimated density function are identified as modal regions in the hierarchical
clustering stage of the algorithm.

2.  Experiment  Two:   58  observations  were  generated  so  that  two  elongated, 
elliptical  clusters  of  observations  are  present  in  this  data  set.   The  scatter 
plot  of  this  sample  set  is  shown  in  Figure  2a,  in  which  the  observation 
numbers are plotted next to the observations. This data set is useful for
illustrating the effectiveness of the proposed clustering procedure in
identifying non-spherical clusters. The dendrogram giving the hierarchical
clustering  obtained  by  the  kth  nearest  neighbour  method  (using  k  =  4)  is  shown 
in  Figure  2b.   Two  disjoint  modal  regions,  corresponding  to  the  two  elliptical 
clusters  of  observations  shown  in  Figure  2a  can  be  identified  in  this 
dendrogram.   However,  observations  51,  58,  and  37  form  a  minor  modal  region 
within  one  of  the  two  clusters;  and,  observations  22,  23,  and  2  form  a  minor 
modal  region  in  the  other  cluster. 

3.   Experiment  Three:   60  observations  were  generated  so  that  two  spherical 
clusters  of  observations  are  present  in  this  sample  and  they  are  connected  by 
a  chain  of  "noise"  observations  (see  Figure  3a).   This  data  set  is  useful  for 
demonstrating  the  effectiveness  of  the  proposed  method  when  a  moderate  amount 
of  noise  is  present  in  the  sample.   The  hierarchical  clustering  obtained  by 
the  kth  nearest  neighbour  method  (using  k  =  4)  is  shown  in  Figure  3b.   It  can 
be  seen  that  the  two  spherical  clusters  are  recovered  by  the  proposed  method 
as  modal  regions,  in  spite  of  the  presence  of  the  noise  observations. 

5.  A  REAL  EXAMPLE 
In order to illustrate how the kth nearest neighbour clustering method works
in practice, it is applied to the well-known Iris data given in Fisher (1936).
The  data  consist  of  four  characteristics  for  three  species  of  Iris;  the 
species  are  Iris  Setosa,  Iris  Versicolor  and  Iris  Virginica,  and  the 
characteristics  are  sepal  length,  sepal  width,  petal  length,  and  petal  width. 
There  are  fifty  samples  from  each  species,  and  hence  the  total  sample  size  is 
150.   This  data  set  has  been  used  by  many  authors  to  test  the  practical 
utility  of  various  clustering  algorithms  (e.g.  Friedman  and  Rubin,  1967).   It 
has  been  found  that  there  are  two  distinct  clusters  of  samples  in  this  data 
set; one corresponding to the samples from Iris Setosa, and the other
corresponding  to  samples  from  the  other  two  species.   Moreover,  the  samples 
from  Iris  Versicolor  and  Iris  Virginica  form  a  somewhat  homogeneous  group  and 
there  is  no  clearcut  distinction  between  samples  from  these  two  different 
species  (see,  for  example,  Fisher,  1936;  Friedman  and  Rubin,  1967;  and 
Gnanadesikan,  1977). 

The hierarchical clustering obtained by applying the kth nearest neighbour
method to this data set, using a value of k = 8, is shown in Figure 4. Two
distinct modal regions can be identified in this sample tree of high-density
clusters; one corresponding to the samples from Iris Setosa, and the other
corresponding to the samples from Iris Versicolor and Iris Virginica.
Moreover, within the Versicolor-Virginica modal region, there are two
sub-modal regions: one consists of samples only from Iris Versicolor, while
the other consists of samples only from Iris Virginica. However, it should be
pointed out that if k were chosen to be much larger than 8 (say, k = 12 or
15), the two sub-modal regions would not appear in the hierarchical clustering
and only the two well-known distinct clusters could be identified by the kth
nearest neighbour method.

Fig. 1a. Scatter-plot of the generated bivariate sample (N = 30) used in
Experiment One. Observation numbers are plotted next to the observations.

Fig. 1b. Tree of sample high-density clusters for data shown in Fig. 1a,
derived from the kth nearest neighbour density estimate using k = 4.

Fig. 2a. Scatter-plot of the generated bivariate sample (N = 58) used in
Experiment Two. Observation numbers are plotted next to the observations.

Fig. 2b. Tree of sample high-density clusters for data shown in Fig. 2a,
derived from the kth nearest neighbour density estimate using k = 4.

Fig. 3a. Scatter-plot of the generated bivariate sample (N = 60) used in
Experiment Three. Observation numbers are plotted next to the observations.

Fig. 3b. Tree of sample high-density clusters for data shown in Fig. 3a,
derived from the kth nearest neighbour density estimate using k = 4.